important words readme
========================

important_words.txt is built from wikistats, word frequency stats and common child words

built by make_important_words.py

format:

word wikinr freqnr childpos totalnr
0    1      2      3        4

where
word: Wikipedia word if available, unchanged from wikistats (normally starts with uppercase),
      else a word from the frequency file or child file (starts with lowercase)
wikinr: normalized count of occurrences per day (calculated from one day in 2016 and one in 2019)
freqnr: occurrence count of a stem in a word frequency list
childpos: whether the word is present in a child words list (1) or not (0)
totalnr: wikinr + 10*freqnr + 10000*childpos
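
As a rough illustration of how the total column (totalnr) combines the other three numbers, here is a minimal sketch. The function name total_score and the sample values are made up for this example; they are not taken from make_important_words.py or from the real data, and treating the counts as integers is an assumption.

```python
def total_score(wikinr, freqnr, childpos):
    """Combine the three per-word numbers with the weights listed above.

    Hypothetical helper for illustration only; the real script may
    compute the value differently in detail.
    """
    return wikinr + 10 * freqnr + 10000 * childpos


# Made-up sample values: 250 + 10*40 + 10000*1
print(total_score(250, 40, 1))  # 10650
```

With these weights, a word that appears in the child words list gets a bonus of 10000, and each occurrence in the frequency list counts ten times as much as one unit of wikinr.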

all characters should be ASCII, with whitespace as the field separator
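
A line in this format could be read roughly as follows. This is only a sketch: WordEntry and parse_line are names invented here, the sample line is made up, and it assumes that all four numeric fields are stored as integers and that every line has exactly five fields.

```python
from typing import NamedTuple


class WordEntry(NamedTuple):
    """One line of important_words.txt (hypothetical container type)."""
    word: str
    wikinr: int
    freqnr: int
    childpos: int
    totalnr: int


def parse_line(line: str) -> WordEntry:
    """Split one whitespace-separated line into its five fields."""
    fields = line.split()  # splits on any run of whitespace
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    word, wikinr, freqnr, childpos, totalnr = fields
    # Assumption: the numeric columns are integers.
    return WordEntry(word, int(wikinr), int(freqnr), int(childpos), int(totalnr))


# Made-up sample line, not taken from the real file.
entry = parse_line("Example 250 40 1 10650")
print(entry.word, entry.totalnr)
```

If the file turns out to contain non-integer values in wikinr or totalnr, the int() conversions would need to be replaced with float().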

data sources:

Wikipedia/wikistats.txt 

Wordfreq/childwords.txt: from https://www.readingrockets.org/article/basic-spelling-vocabulary-list

Wordfreq/ANC-all-count.txt: from http://www.anc.org/data/anc-second-release/frequency-data/
using Written & Spoken


